Sampling strategies for information extraction over the deep web
Similar resources
Sampling strategies for information extraction over the deep web
Information extraction systems discover structured information in natural language text. Having information in structured form enables much richer querying and data mining than possible over the natural language text. However, information extraction is a computationally expensive task, and hence improving the efficiency of the extraction process over large text collections is of critical intere...
Sampling the National Deep Web
A huge portion of today’s Web consists of web pages filled with information from myriads of online databases. This part of the Web, known as the deep Web, is to date relatively unexplored and even major characteristics such as number of searchable databases on the Web or databases’ subject distribution are somewhat disputable. In this paper, we revisit a problem of deep Web characterization: ho...
Sampling Strategies for Information Goods
This paper analyzes optimal decisions concerning the size of the sample and the price of the paid content for online publishers of digital information goods when sampling serves the dual purpose of disclosing content quality and generating advertising revenue. We show in a reduced-form model how the publisher’s optimal ratio of advertising revenue to sales revenue is linked to characteristics o...
Sampling, information extraction and summarisation of Hidden Web databases
Hidden Web databases maintain a collection of specialised documents, which are dynamically generated in response to users’ queries. The majority of these documents are generated through Web page templates, which contain information that is often irrelevant to queries. In this paper, we present a system designed to detect and extract query-related information from documents sampled from database...
Journal
Journal title: Information Processing & Management
Year: 2017
ISSN: 0306-4573
DOI: 10.1016/j.ipm.2016.11.006